Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use method and path as labels for http metrics #8271

Merged
merged 4 commits into from
Oct 2, 2020

Conversation

coignetp
Copy link
Contributor

@coignetp coignetp commented Jul 9, 2020

Hey!
I'm currently working on the consul integration at Datadog, and we want to scrape the prometheus endpoint to get consul metrics. In the current state several metrics are not consistently named, and contain parts that would be good as labels instead, and that makes scraping the endpoint and converting to Datadog metrics troublesome.
This PR use the method and the path as labels of the consul.http metric, and remove it from the metric name.

Example: consul_http_GET_v1_agent_metrics -> consul_http with labels method = GET, path = v1_agent_metrics.

This change is not backward compatible since the metric names will be different. Maybe you'd like to keep sending metrics like consul_http_GET_v1_agent_metrics as well.

Please tell me what you think about this change! 😄

P.S.: I also opened a PR for the raft module hashicorp/raft#409 for a similar change.

@coignetp coignetp marked this pull request as draft July 9, 2020 08:43
@coignetp coignetp marked this pull request as ready for review July 9, 2020 09:04
@dnephin dnephin added theme/telemetry Anything related to telemetry or observability type/enhancement Proposed improvement or new feature labels Jul 9, 2020
@dnephin
Copy link
Contributor

dnephin commented Jul 9, 2020

Hello, and thank you for contributing this pull request! I think the changes you propose are correct, but we will very likely need to keep the existing metrics for backwards compatibility. I think we can emit both metrics, and rely on the prefix_filter configuration option to allow someone to disable the one they don't want.

What I'm not sure about is if we should start by disabling the new labelled version by default and require someone to enable it in config (we can change the default in a future release), or if we should emit both by default.

Another thing we might want to do is change the name of the metric so that it is easier to include or exclude using a prefix. I suspect a filter of -consul.http would exclude both the new and old one. Maybe the new metric could have a name like consul.api.http to make it easier to distinguish them from the filter?

Copy link
Contributor Author

@coignetp coignetp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your quick response! I updated the PR to send both metrics, and I changed the new metric name to consul.api.http.

Also, should I add -consul.api.http in the default prefix_filter option then? I guess the change should be here

telemetry = {
metrics_prefix = "consul"
filter_default = true
but I'm not sure about that

@@ -177,7 +177,7 @@ This is a full list of metrics emitted by Consul.
| `consul.dns.stale_queries` | This increments when an agent serves a query within the allowed stale threshold. | queries | counter |
| `consul.dns.ptr_query.` | This measures the time spent handling a reverse DNS query for the given node. | ms | timer |
| `consul.dns.domain_query.` | This measures the time spent handling a domain query for the given node. | ms | timer |
| `consul.http..` | This tracks how long it takes to service the given HTTP request for the given verb and path. Paths do not include details like service or key names, for these an underscore will be present as a placeholder (eg. `consul.http.GET.v1.kv._`) | ms | timer |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I leave this line in the documentation since both metrics are sent?

Copy link
Contributor Author

@coignetp coignetp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnephin I'm not sure I understand why the CI is failing. Any suggestion?

@@ -123,6 +123,7 @@ func DefaultSource() Source {
telemetry = {
metrics_prefix = "consul"
filter_default = true
prefix_filter = [ "-consul.api.http" ]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it here, please tell me if it should be added elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be correct. I'll need to double check. A comment explaining why this is disabled by default would be great. Something about it being a new metric and will be enabled by default in a future version.

@dnephin
Copy link
Contributor

dnephin commented Jul 22, 2020

Thanks! This is looking good

I think CI is failing because of the change to the default config source. The expected value will need to be updated to account for the new defaults.

@coignetp
Copy link
Contributor Author

Thanks @dnephin ! I updated the expected value in the test and added a small comment

@coignetp
Copy link
Contributor Author

coignetp commented Aug 3, 2020

Hi @dnephin not sure how this change affected a cache test here

@coignetp
Copy link
Contributor Author

Hi @dnephin, sorry to bother you again, but how is there any news here?

@mkcp
Copy link
Contributor

mkcp commented Oct 2, 2020

Hey @coignetp, sorry for all the delays on this and your Raft contribution (you can blame me for those)! Thank you for taking the time to write this out and keep up with its status. Great work overall!

I'm going to merge in your PR. After that I'll follow up with a PR of my own with a few changes to how we're handling these deprecated metrics. We want to add a warning for users when they're using the old metrics, and have the new consul.api.http metrics active by default. In order to do that, we'll be adding a new configuration flag and removing the entry in ignored prefixes.

This path has been something we've been discussing internally, so I apologize that that decision was not available when you picked up the initial issue. Even though we're landing it in a slightly different way, I want to make sure your commits still make it in. Again, thank you for submitting this PR. It may be small but you can see from how many issues it's going to close out that it's high impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/telemetry Anything related to telemetry or observability type/enhancement Proposed improvement or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants